Sustainable Retrieval Tiers for Filecoin

Authors: Miroslav Bajtoš (miroslav@meridian.space)

Last Update:

🚨

This document describes my initial thoughts as I started to explore the problem space. Based on the feedback from others, I decided to slightly change the scope and continue this work in .

Introduction

Reliable retrieval of data stored on Filecoin is a long-standing problem that has not been solved yet.

The original Filecoin Whitepaper briefly describes the concept of Retrieval Providers acting as
gateways between clients downloading data and Storage Providers (SPs) persisting the content, with micro-payment channels used to pay for the retrieval service in a trustless setting. Significant funding and effort went into realising this vision during the 2021-2024 era (Magmo's State Channels, Titan network, Saturn dCDN, and more), but the reality is that Filecoin retrievals remain an unsolved problem.

The current status quo is that SPs are expected to serve retrievals as part of their storage deals.
Retrievals are essentially a public goods service relying on altruism, with no clarity about what
quality of service is expected or required from the Storage Providers, and with no upper bound on
the costs incurred by the SPs when serving retrievals. It's not surprising that SPs are reluctant to
serve retrievals under such conditions, and that the adoption of Filecoin is suffering from a lack
of reliable retrievals.

While there are ongoing efforts to implement retrievals in a more sustainable way (Filecoin Warm
Storage, FilCDN), it will take several quarters before such new protocols are ready for production
use and widely adopted.

To bridge the gap until sustainable retrieval solutions are available, I propose to introduce a set
of retrieval tiers based on the current reality and the needs of emerging solutions like FilCDN.

Notably, the introduction of paid tiers enables us to significantly scale back expectations for the quality of service in the free-public-access tier, making this tier sustainable and the pay-for-access tiers more attractive.

Goals

  • Change the perception of retrievals in the Filecoin ecosystem and make paying for high-quality retrieval services a standard expectation.
  • Introduce the concept of multiple retrieval service tiers that enable SPs to configure their operations to provide different performance levels and maintain control over their costs.
  • Preserve the current model of free retrievals as a public goods service.
  • Ensure that clients paying for Onchain Cloud/WarmStorage deals can retrieve their data with better performance than the general public.
  • Define baseline expectations for quality of retrieval service provided by SPs, with different
    levels depending on the retrieval tier. Allow SPs to configure their operations to meet the
    defined QoS while keeping their costs under control.
  • Introduce a minimal framework that allows SPs to charge for retrieval services, enabling the
    ecosystem to experiment with different payment models and protocols.
  • Make paid retrievals an attractive proposition to builders by limiting the expected quality of
    free retrievals, thereby creating space for more performant retrieval services as paid offerings.

Out of Scope

  • Retrieval services for data stored in PoRep deals.
  • Enforcement of the retrieval tiers and their QoS.
  • Registry of SPs and the paid retrieval services they integrate with.
  • Mechanisms allowing clients to pick SPs based on the SPs' SLIs.
  • Standardised payment models or protocols.
  • Abuse prevention mechanisms for the free tier.

Proposed Tiers

Tier | Quality of Service | Storage Client Costs | Retrieval Client Costs
--- | --- | --- | ---
Free Public Access | Low | Small premium added to the deal price (e.g. $X/TiB stored). | None.
Owner Access | Medium | Already included in the storage deal price. | None.
Paid Retrieval Service | Best | None. | Custom pricing in the Retrieval Service agreement. (E.g. FilCDN charges $X/TiB of egress consumed.)
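To make the relationship between the tiers concrete, the Python sketch below shows one way an SP might route an incoming request to a tier, using the authorisation rules described in the tier sections that follow. It is a minimal sketch under stated assumptions, not a specification: the `Tier`, `Deal`, `RetrievalRequest` and `classify` names are hypothetical, signature verification is assumed to have happened upstream (so `signer` is an already-verified wallet address), and the access-token check is passed in as a callable because the token format is out of scope of this proposal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Tier(Enum):
    PAID_RETRIEVAL_SERVICE = "paid-retrieval-service"
    OWNER_ACCESS = "owner-access"
    FREE_PUBLIC_ACCESS = "free-public-access"


@dataclass
class Deal:
    payer: str           # wallet that paid for the storage deal
    public_access: bool  # requested by the storage client in the deal metadata


@dataclass
class RetrievalRequest:
    signer: Optional[str] = None        # already-verified signing wallet, if the request was signed
    access_token: Optional[str] = None  # opaque token for the paid tier, if any


def classify(
    req: RetrievalRequest,
    deal: Deal,
    token_is_valid: Callable[[str], bool],
) -> Optional[Tier]:
    """Return the tier that should serve the request, or None to deny it."""
    # Most privileged lane first: a valid access token from a paying retrieval client.
    if req.access_token is not None and token_is_valid(req.access_token):
        return Tier.PAID_RETRIEVAL_SERVICE
    # Requests signed by the wallet that paid for the storage deal.
    if req.signer is not None and req.signer == deal.payer:
        return Tier.OWNER_ACCESS
    # Everyone else, but only if the storage client opted in to public access.
    if deal.public_access:
        return Tier.FREE_PUBLIC_ACCESS
    return None
```

In this sketch, an invalid token or a signature from a different wallet does not cause a denial on its own; the request simply falls through to the next lane. Whether an SP should reject such requests outright instead is a policy choice the sketch does not settle.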

Free Public Access

The Public Access Tier transforms the current status quo of SPs serving retrievals as a public goods service into an opt-in paid option for storage deals.

Storage clients must request public access in the storage deal metadata and pay a higher storage fee to cover the costs of providing public access to the stored data.

With that, anyone can retrieve public data stored by the Filecoin SP participating in the deal, with no authorisation or payment required.

  • Authorisation: No authorisation; anyone can retrieve public data stored with the SP.
  • QoS: No guarantees; SPs are expected to serve retrievals on a best-effort basis and encouraged to apply strict limits.

    Recommended target for SPs (to be discussed): rate limit of 1 req/minute (per IP address), TTFB under 1 second, 10 MB/s bandwidth in total, 90% availability.

    This allows SPs to configure their operations to keep their costs under control while providing free access to the content they store.

  • Payment: The payment for public retrievals is covered by the premium added to the storage deal fee paid by the storage client; no additional payments are required from the client retrieving the data.
    SPs are expected to absorb the costs of serving retrievals as part of their storage service.
    In the initial version, the premium could be set to zero, in which case SPs would provide public retrievals as a contribution to the Filecoin commons, just as they do today.
  • Enforcement: No enforcement; SPs are expected to serve retrievals on a best-effort basis.
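The recommended per-IP limit of 1 request per minute can be implemented with a standard token bucket. The Python sketch below is an illustration under stated assumptions, not Curio's implementation: the `PerIpTokenBucket` class and its parameters are hypothetical, and the clock is injectable via `now` so the behaviour is easy to test.

```python
import time
from typing import Dict, Optional, Tuple


class PerIpTokenBucket:
    """Token-bucket rate limiter keyed by client IP address."""

    def __init__(self, capacity: float = 1.0, refill_period_sec: float = 60.0):
        # capacity=1, refill_period_sec=60 -> at most 1 request per minute per IP,
        # matching the recommended (still to be discussed) public-tier target.
        self.capacity = capacity
        self.refill_period_sec = refill_period_sec  # seconds needed to regain one token
        self._buckets: Dict[str, Tuple[float, float]] = {}  # ip -> (tokens, last update)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        tokens, last = self._buckets.get(ip, (self.capacity, now))
        # Refill proportionally to the time elapsed since the last update, up to capacity.
        tokens = min(self.capacity, tokens + (now - last) / self.refill_period_sec)
        if tokens >= 1.0:
            self._buckets[ip] = (tokens - 1.0, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False
```

A per-IP limit does not cover the 10 MB/s total bandwidth target; that would need a separate, SP-wide limiter on bytes served. A production version would also need to evict idle entries from the bucket map.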

Tuning parameters:

  • QoS expectations - downstream bandwidth, availability (retrieval success rate), rate limits, egress quotas, etc.
  • The fee premium for deals with public access to the data.

Owner Access

It's crucial to ensure that clients storing data with Filecoin can retrieve their data back. This becomes even more important after public access becomes an opt-in paid feature with lower-quality service.

The data owners should receive a higher level of service than the general public, provided that they authenticate themselves as the payers of the storage deal.

  • Authorisation: The retrieval request must be signed by the wallet that paid for the storage
    deal. The SP can verify the signature and allow or deny the retrieval request.

    We should also implement a delegation scheme allowing the wallet owner to grant other entities (wallets) access to fast retrievals.

    Technical details will be specified later.

  • QoS: SPs are expected to serve retrievals for the data owners with a decent level of service that’s better than the Public Access Tier.

    Example parameters for further discussion: rate limit of 1 req/sec (per storage client), 10 MB/s bandwidth per client or 100 MB/s bandwidth per SP, TTFB under 500ms, 99% availability, 10x StoredBytes total egress traffic per month (per client), etc.

    It’s important to enforce egress quotas to prevent Owners from sharing their access with the general public.

    We need to perform further research and discussion to define the minimal QoS expectations
    acceptable to the community of Filecoin SPs and clients.

  • Payment: The payment for retrievals is included in the storage deal fee; no additional payments are required from the client retrieving the data. The SPs are expected to absorb the costs of serving retrievals as part of their storage service.

    In the current pricing model, storage clients pay a fixed USDFC amount per TiB stored per month. This means that owner-authorised retrievals are also paid per TiB stored, not per TiB retrieved.

  • Enforcement: No enforcement. Storage clients can choose a different SP if they are not
    satisfied with the quality of the retrieval service.
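Since the technical details of owner authentication are still to be specified, the Python sketch below only illustrates the policy side: accept requests from the payer wallet or one of its delegates, and enforce a monthly egress quota using the example parameter of 10x StoredBytes per client. The `OwnerAccessPolicy` class, its fields and the month-key format are hypothetical, and signature verification is assumed to have already produced the verified `signer` address. Delegates draw from the payer's quota, which is what keeps delegation from becoming a way to share access with the general public.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Example parameter from the discussion above (not final): 10x StoredBytes per month, per client.
EGRESS_MULTIPLIER = 10


@dataclass
class OwnerAccessPolicy:
    payer: str                                              # wallet that paid for the storage deal
    stored_bytes: int                                       # StoredBytes for this client
    delegates: Set[str] = field(default_factory=set)        # wallets granted access by the payer
    egress_used: Dict[str, int] = field(default_factory=dict)  # month key -> bytes served

    def is_authorised(self, signer: str) -> bool:
        return signer == self.payer or signer in self.delegates

    def try_serve(self, signer: str, month: str, nbytes: int) -> bool:
        """Record the egress and return True if the retrieval may be served."""
        if not self.is_authorised(signer):
            return False
        quota = EGRESS_MULTIPLIER * self.stored_bytes
        used = self.egress_used.get(month, 0)
        # Delegates share the payer's quota, so delegation cannot multiply egress.
        if used + nbytes > quota:
            return False
        self.egress_used[month] = used + nbytes
        return True
```

A request that would exceed the quota could be downgraded to the Free Public Access tier (if the deal has it) instead of being denied; the sketch leaves that decision to the caller.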

Tuning parameters:

  • QoS expectations - downstream bandwidth, availability (retrieval success rate), egress quota, etc.
  • Storage deal pricing - what's the reasonable price to cover the costs of serving retrievals in the quality defined for this tier?

Paid Retrieval Service

The Paid Retrieval Service creates a framework allowing SPs to charge for a high-quality retrieval
service. The details of such offerings are outside the scope of this document. Initially, we expect SPs and the ecosystem to experiment with different payment models and protocols, which can later be standardised and implemented as built-in components of the SP and client software.

  • Authorisation: The retrieval request must include a valid access token.

    The access token format and protocols for obtaining and verifying the access tokens are out of scope of this proposal. We want to keep the access token format flexible, allowing the ecosystem to experiment with different approaches.

    For example, FilCDN can create an off-chain deal with an SP, where the SP agrees to meet FilCDN's requirements on QoS for cache-miss requests, FilCDN agrees to compensate the SP for egress consumed, and as part of this deal, the SP issues an access token that FilCDN can use for retrieving the content via the fast lane reserved to paying clients.

    Another option is to use the same wallet-signature mechanism as described in the Owner Access tier.

  • Payment: The payment mechanisms are out of scope of this proposal. We expect the ecosystem to explore different approaches before settling on a set of recommended options.

    For example, FilCDN may use Filecoin OnChain Cloud's payment rails with egress-based metering to establish a payment mechanism where the client paying for the CDN service transfers a fraction of the service fee to the SP based on the cache-miss egress consumption reported by the CDN.

  • QoS: QoS requirements are out of scope of this proposal. In the initial exploration phase, we
    expect SPs and parties paying for retrieval services to negotiate the SLAs.

    For example, FilCDN may require SPs to offer 100 MB/s bandwidth, TTFB under 300ms, 99.9% availability, with the promise that FilCDN will pay a fixed amount per GB of egress traffic served by the SP.

  • Enforcement: Enforcement is out of scope of this proposal. We expect the ecosystem to
    experiment with different trust models and different approaches to enforcing SLAs.

    For example, FilCDN may require the SP to trust the metrics collected by the CDN layer, and let the payment rails validator slash the payments to the SP if the SP fails to meet the SLA.
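The FilCDN examples above (a fixed price per GB of cache-miss egress, SLA targets of 99.9% availability and TTFB under 300ms, and slashing when the SLA is missed) can be combined into a per-period settlement calculation. The Python sketch below is purely illustrative: the `Sla`, `EgressReport` and `settle` names, the price, the 50% slash fraction, and judging TTFB at the 95th percentile are all assumptions, not part of any agreed protocol or of Filecoin Onchain Cloud's payment rails. GB is taken as 10^9 bytes.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Sla:
    min_availability: Decimal = Decimal("0.999")  # 99.9% availability
    max_ttfb_ms: int = 300                        # TTFB under 300 ms
    price_per_gb: Decimal = Decimal("0.01")       # hypothetical rate (USDFC per GB of egress)
    slash_fraction: Decimal = Decimal("0.5")      # hypothetical penalty when the SLA is missed


@dataclass
class EgressReport:
    """Cache-miss metrics for one settlement period, as reported by the CDN layer."""
    egress_bytes: int
    successful_requests: int
    total_requests: int
    p95_ttfb_ms: int  # assumption: TTFB is judged at the 95th percentile


def settle(report: EgressReport, sla: Sla) -> Decimal:
    """Amount owed to the SP for the period, after applying the SLA penalty (if any)."""
    gb = Decimal(report.egress_bytes) / Decimal(10**9)
    gross = gb * sla.price_per_gb
    # A period with no cache-miss requests counts as fully available.
    if report.total_requests == 0:
        availability = Decimal(1)
    else:
        availability = Decimal(report.successful_requests) / Decimal(report.total_requests)
    sla_met = availability >= sla.min_availability and report.p95_ttfb_ms <= sla.max_ttfb_ms
    return gross if sla_met else gross * (1 - sla.slash_fraction)
```

For example, 500 GB of egress at the hypothetical rate settles to 5 USDFC when the SLA is met, and to 2.5 USDFC when availability drops to 99.8%. `Decimal` is used to avoid binary floating-point rounding in payment amounts.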

Conclusion

Implementing retrieval tiers will transform Filecoin from a network where retrievals are a second-class feature with unclear expectations about the quality of service into a compelling data
platform with a viable model for both storage and retrievals. The proposed tiered approach balances SP operational needs with client expectations, and the minimal framework allows for experimentation with payment models and protocols.

Next Steps

  • Submit this proposal as an FIP/FRC. (Maybe?)
  • Discuss the proposal with the Filecoin community and reach consensus about the QoS and pricing for the Public Access tier.
  • Discuss the proposal with the FS (Onchain Cloud) working group and reach consensus about the initial parameters (QoS, pricing) for the Owner Access tier.
  • Create a design document for implementing these three tiers in Curio for PDP (WarmStorage) deals.
  • Implement all three tiers in Curio for PDP (WarmStorage) deals.
  • Start a discussion about implementing these tiers for PoRep deals.

Potential Future Enhancements

Measure the QoS of the Public Access tier

Implement checker probes, such as Filecoin Spark, to monitor the QoS of individual SPs and build reputation data to influence clients' choices of SPs.
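As an illustration of how probe results could be compared against the recommended Public Access targets (90% availability, TTFB under 1 second), the Python sketch below aggregates a batch of probe outcomes for one SP. It does not describe how Filecoin Spark actually works: the `ProbeResult`, `QosReport` and `evaluate` names are hypothetical, and using the median TTFB of successful probes is an assumption.

```python
import statistics
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeResult:
    ok: bool                         # did the probe retrieval succeed?
    ttfb_ms: Optional[float] = None  # time to first byte; None for failed probes


@dataclass
class QosReport:
    availability: float
    median_ttfb_ms: Optional[float]
    meets_target: bool


def evaluate(
    probes: List[ProbeResult],
    min_availability: float = 0.90,  # recommended public-tier target: 90% availability
    max_ttfb_ms: float = 1000.0,     # recommended public-tier target: TTFB under 1 second
) -> QosReport:
    if not probes:
        raise ValueError("need at least one probe result")
    ttfbs = [p.ttfb_ms for p in probes if p.ok and p.ttfb_ms is not None]
    availability = sum(1 for p in probes if p.ok) / len(probes)
    median_ttfb = statistics.median(ttfbs) if ttfbs else None
    meets = (
        availability >= min_availability
        and median_ttfb is not None
        and median_ttfb <= max_ttfb_ms
    )
    return QosReport(availability, median_ttfb, meets)
```

Reports like this, collected over time, are the kind of input a reputation score could be built from.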

Crowd-sourced Reputation based on QoS of Owner Access tier

Implement crowd-sourced reputation data based on metrics reported by retrieval clients (e.g., Synapse SDK), similar to how people crowd-source ratings on Google Maps.

Abuse Prevention for Public Access Tier

Free tiers are vulnerable to abuse (DDoS, bandwidth exhaustion). We can build abuse prevention mechanisms, such as rate limiting and IP blocking, into Curio.
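One simple way to combine rate limiting with IP blocking is to temporarily block an IP address that keeps hitting the rate limit. The Python sketch below shows that idea with a sliding window of violations per IP. The `IpBlocker` class and its thresholds (10 violations within 10 minutes trigger a 1-hour block) are hypothetical defaults for illustration, not recommended values.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class IpBlocker:
    """Temporarily block IPs that repeatedly exceed the free-tier rate limit."""

    def __init__(self, max_violations: int = 10, window_sec: float = 600.0, block_sec: float = 3600.0):
        self.max_violations = max_violations
        self.window_sec = window_sec
        self.block_sec = block_sec
        self._violations: Dict[str, Deque[float]] = defaultdict(deque)  # ip -> violation timestamps
        self._blocked_until: Dict[str, float] = {}

    def is_blocked(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        until = self._blocked_until.get(ip)
        if until is None:
            return False
        if now >= until:
            del self._blocked_until[ip]  # block expired
            return False
        return True

    def record_violation(self, ip: str, now: Optional[float] = None) -> None:
        """Call whenever a request from `ip` is rejected by the rate limiter."""
        now = time.monotonic() if now is None else now
        q = self._violations[ip]
        q.append(now)
        # Drop violations that fell out of the sliding window.
        while q and now - q[0] > self.window_sec:
            q.popleft()
        if len(q) >= self.max_violations:
            self._blocked_until[ip] = now + self.block_sec
            q.clear()
```

Blocking by IP is a blunt tool (clients behind NAT or shared proxies share an address), so the block duration should stay short.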

Allow SPs to differentiate based on QoS and price

Allow SPs to differentiate their service by offering better QoS or price for Public Access and Owner Access tiers, or by offering several different levels (QoS + price) within these tiers.

Prior Work

On-chain Retrieval Expectations

FIP discussion: https://github.com/filecoin-project/FIPs/pull/862

This proposes a set of retrieval SLA ‘tiers’. These tiers identify the ‘category’ of data - if it
is fully offline, meant for low-volume-retrieval archival usage, or for higher bandwidth activity.
The consensus part of this FIP is a proposal to encode the retrieval SLA tier as part of a deal
proposal. Standardizing retrieval expectations in this way allows storage providers to apply
appropriate policies and pricing to deals.
Tier | Offline | Archival | Online | Public
--- | --- | --- | --- | ---
Example of data | Large datasets onboarded with physical disks | Long term private data storage & backups | Research Datasets | Files, Documents, Images
Amortized anticipated retrieval volume | 0 | 1x / month | 1x / week | 1x / day
Expected resilience to burstiness | n/a | 2x | 3x | 5x
User-expected Latency | n/a | < 6hr | < 1min | < 5sec

The FIP proposal in PR#862 discusses retrieval expectations for PoRep deals and defines categories of data based on access patterns (archival vs. online), but it does not address pricing mechanisms or the trade-offs between price and quality of service.

In our proposal, we approach the problem from the perspective of warm-storage (online) deals, focusing on designing a framework that addresses the needs of the three most common personas interested in retrieval services while considering the needs of SPs for running a sustainable business.